Data Expo 2008 «Airline on-time performance»

by Mostafa Abobakr

Investigation Overview

We use this dataset to look for insights that could help reduce flight delays, and to produce evidence-backed findings about which carriers have the fewest delays.

Dataset Overview

This dataset reports flights in the United States during 2008, including carriers, arrival and departure delays, and the reasons for delays. It consists of 7,009,724 rows (data points) after removing 4 duplicated rows. I reduced the dataset from 29 columns to 19, and eventually to 9 columns (features), and I replaced the carrier codes with carrier names taken from another file, carriers.csv. After some structuring with SQL, I exported the columns to work with into 2008_flights.csv, then returned to Jupyter Notebook to complete the work.
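The preparation steps described above (dropping duplicates, keeping a subset of columns, and swapping carrier codes for names) can be sketched roughly as follows. This is only an illustration, not the SQL actually used for the project: it assumes SQLite through Python's built-in sqlite3 module, a hypothetical helper named build_flights_table, a tiny made-up sample instead of the real files, and a four-column subset (Month, UniqueCarrier, ArrDelay, DepDelay) that may differ from the 9 columns finally kept. The Code and Description columns are assumed to match the layout of carriers.csv.

```python
import sqlite3


def build_flights_table(flight_rows, carrier_rows):
    """Deduplicate flights and replace carrier codes with names (illustrative only)."""
    con = sqlite3.connect(":memory:")
    # Assumed column subset; the real project kept 9 columns that are not listed here.
    con.execute(
        "CREATE TABLE flights "
        "(Month INTEGER, UniqueCarrier TEXT, ArrDelay REAL, DepDelay REAL)"
    )
    con.execute("CREATE TABLE carriers (Code TEXT PRIMARY KEY, Description TEXT)")
    con.executemany("INSERT INTO flights VALUES (?, ?, ?, ?)", flight_rows)
    con.executemany("INSERT INTO carriers VALUES (?, ?)", carrier_rows)

    rows = con.execute(
        """
        SELECT f.Month, c.Description AS Carrier, f.ArrDelay, f.DepDelay
        FROM (SELECT DISTINCT * FROM flights) AS f       -- drop exact duplicate rows
        JOIN carriers AS c ON c.Code = f.UniqueCarrier   -- carrier code -> carrier name
        ORDER BY f.Month, Carrier, f.ArrDelay
        """
    ).fetchall()
    con.close()
    return rows


# Made-up sample rows: the first flight appears twice on purpose.
sample_flights = [
    (1, "WN", -14.0, 8.0),
    (1, "WN", -14.0, 8.0),
    (2, "AA", 34.0, 19.0),
]
sample_carriers = [
    ("WN", "Southwest Airlines Co."),
    ("AA", "American Airlines Inc."),
]

result = build_flights_table(sample_flights, sample_carriers)
for row in result:
    print(row)
```

Note that the inner join in this sketch silently drops any flight whose carrier code has no match in the carriers table; a LEFT JOIN would keep such rows with a missing name instead.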


Advanced Data Analysis Track «Communicate Data Findings» Project

❯ Investigation points

❯ Conclusions